Introduction


A brain tumour forms when aberrant cells proliferate unchecked in the brain. Brain tumours come in
a wide variety of forms, which are broadly divided into two groups: benign and malignant [8].
Surgical removal of benign (non-cancerous) brain tumours is typically less invasive and more
manageable than that of malignancies [9-13]. Benign tumours grow slowly and usually stay isolated
from the normal brain tissues around them. Nevertheless, brain tumours, whether malignant or benign,
can sometimes blend in with their normal surroundings [14]. Consequently, removing them completely
without harming the surrounding brain tissue can be challenging [15-19].


Magnetic resonance imaging (MRI) is a biomedical imaging technique that captures images of the
structure and anatomy of the human body or an internal organ [20-25]. MRI scanners combine radio
waves and powerful magnetic fields to produce pictures of the internal organs of the body. Each of
the several MRI sequences has a specific purpose [26-31]. T1, T1C, T2, and FLAIR are among the most
frequent. The signal intensity of T1C images is higher than that of other MRI sequences because of
their increased contrast. Even so, precisely locating the tumour's limits remains difficult: because
tumour borders are often barely perceptible to the human eye, computer vision is used to predict
them [32].


Early detection and treatment of cancer depend on being able to pinpoint the exact location of tumours 
and classify them according to their specific characteristics [33]. The exact location of the tumour can 
also reveal its cause, which could be related to things like mental stress, a genetic predisposition to 
brain cancer, or exposure to extremely high doses of ionising radiation. Including these details with the 
patient's symptoms considerably improves survival chances [34-41]. 


Machine learning (ML) is the study of computer algorithms that improve automatically through
experience [42]. It is regarded as a subset of artificial intelligence (AI). Machine learning
algorithms construct a mathematical model from sample data, often referred to as "training data," in
order to generate predictions or decisions without being explicitly programmed to do so [43-47].
Email filtering and computer vision are only two examples of the many uses for machine learning
algorithms, which are employed when developing a conventional algorithm would be too tedious or
infeasible [48].


Machine learning is closely related to computational statistics, which focuses on making predictions
using computers. It draws theory, methodology, and application domains from mathematical
optimization [49-53]. Data mining is a related discipline that focuses on exploratory data analysis
through unsupervised learning. When applied to business problems, machine learning is also known as
predictive analytics [54].


Neural networks are computing systems inspired by the intricate web of connections that make up
biological brains, most notably the human brain. By analysing given examples, these systems "learn"
to carry out tasks [55-59]. As a result, they are able to complete tasks without being given
explicit instructions; instead, they infer rules autonomously. Training starts with humans providing
relevant data; the network then iteratively forms rules by comparing its output for the input data
with the expected output. The larger and more diverse the training dataset, the more accurate the
rules it produces tend to be [60-64].


The building blocks of artificial neural networks are artificial neurons that mimic the behaviour of
real neurons by taking in data, processing it using an internal state (activation) and a threshold,
and finally producing an output value. The first inputs are data that humans can understand, such as
pictures and documents. An artificial neuron is, in essence, a mathematical function [65-71]. Each
input is multiplied by an individual weight, the weighted inputs are summed, and the sum is passed
through a non-linear function known as an activation or transfer function. The notion of thresholds
underpins these activation functions; a neuron is activated when the total surpasses the threshold
value [72-75]. A layer is a group of neurons that make up a specific level of a neural network.
Typically, each layer's neurons take input from the layer below, process it, and then send the
output to the layer above [76-81].
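The neuron and layer described above can be sketched in a few lines of plain Python. This is only an illustration: the function names, weights, and inputs are made up for the example, and the sigmoid and hard-threshold functions are two common activation choices rather than the ones used later in this work.

```python
import math

def sigmoid(z):
    """Smooth, non-linear activation mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def step(z, threshold=0.0):
    """Hard-threshold activation: the neuron fires only above the threshold."""
    return 1.0 if z > threshold else 0.0

def neuron(inputs, weights, bias, activation=sigmoid):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)

def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a group of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

# Illustrative values only: a hidden layer of two neurons feeding one output neuron.
hidden = layer([0.5, -1.0], [[1.0, 2.0], [-1.0, 0.5]], [0.0, 0.1])
output = neuron(hidden, [1.0, -1.0], 0.0, activation=step)
```

Each layer's output list becomes the input list of the next layer, which is the layer-to-layer flow described in the paragraph above.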


Objective 


We present a method that automatically detects meningiomas, gliomas, and pituitary tumours in a
given brain MRI and segments the tumour region accordingly, removing the need for a medical
specialist to do so manually. Our solution has two parts: segmentation and classification. The
segmentation step uses a Convolutional Neural Network based on U-Net. The three-stage design of
U-Net, comprising down-sampling (encoding), bottleneck (filtering out unwanted information), and
up-sampling (decoding), resembles the shape of the letter 'U', hence the name. It is a specific kind
of neural network that assigns each pixel in an image a label from several possible classes. It also
keeps the original image's resolution, so that the labelled pixels can be easily located on the
image. Therefore, unlike conventional neural networks, the U-Net architecture can identify not only
the "what" but also the "where." After the U-Net model segments the tumour, a basic fully
Convolutional Neural Network determines the type of tumour. Tailoring a neural network to our
requirements calls for a toolkit or framework that exposes its low-level APIs. This is why we opted
to build our model using PyTorch, a widely used library for computer vision and deep learning
developed by Facebook AI.


The following review shows that our research problem is both novel and pertinent. Writing a report
on the topic requires knowledge of the current literature and the arguments around the subject.
Research on effective methods for detecting, segmenting, and classifying images has grown rapidly in
the last several years. In particular, biomedical image segmentation's impact on raising health care
quality has contributed to its growing popularity [82-87]. It is now possible to separate
cerebrospinal fluid (CSF) from brain cells (predominantly fat) in medical images using a variety of
machine learning methods, including fuzzy c-means, Bayesian classification, and expectation
maximisation [88]. Segmenting brain tumour regions from normal tissues in CT images has been
accomplished using an SVM with the Radial Basis Function kernel. It should be mentioned that MRI
images carry more detail than CT scans, hence MRI is better suited for accurately segmenting
diseased tissues, particularly in delicate organs like the brain. Studies have also investigated the
use of Artificial Neural Networks for feature extraction from brain images with the goal of directly
classifying tumours into one of several kinds. However, these approaches still require manual
segmentation of the tumour region. Segmentation has also made use of multi-modal MRI scans. In that
setting each MR sample comprises five 3D volumes: T1, T1C, T2, FLAIR, and the mask image [89]. The
dimensions of each volume are 240 x 240 x 155. While this dramatically improves prediction accuracy,
such data is more difficult to acquire and requires more GPU VRAM for training, because all four MR
sequences and their associated 3D volumes are needed. As a result, both the training time and the
difficulty of manual verification increase substantially [90-97].


Literature Survey 


In their paper "A Soft Segmentation of CT Brain Data," Alexandra Lauric and Sarah Frisk observe that
there is a dearth of literature on CT brain segmentation compared to MRI brain segmentation. In most
cases, MRI is chosen over CT for brain imaging because it is better at discriminating soft tissues.
Nevertheless, there are situations where MRI is not appropriate and other scanning techniques must
be employed. To make CT brain images more useful, the authors investigate techniques for soft tissue
segmentation and assess the effectiveness of current methods for CT segmentation of brain tissue.
Their model uses three methods to segment brain and cerebrospinal fluid: Bayesian classification,
Fuzzy c-means, and Expectation Maximization. Although these methods performed better than the
routinely used threshold-based segmentation, the results demonstrate the need for new imaging
protocols that enhance CT's ability to differentiate soft tissue detail, and for segmentation
algorithms that are specific to CT.


Shanmugapriya and Ramakrishnan [1] present a model for brain tumour segmentation in CT scans that is
based on a support vector machine classifier. They note that medical image processing is an
interdisciplinary area that has drawn researchers from many different disciplines, including
biology, computer science, engineering, applied mathematics, and medicine, and that advancements in
imaging modalities have brought new difficulties, such as how to handle and interpret massive
numbers of images for the purpose of illness diagnosis and treatment. In their paper, brain tumours
in computed tomography (CT) images are segmented using support vector machines (SVMs). After
comparing two SVM kernel functions, the RBF-SVM was shown to be the superior choice.


Siva Sankari et al. [2] developed a model for mining brain tumour features from magnetic resonance
imaging (MRI), which allows radiologists to see the body's internal architecture clearly. Its
superior contrast compared to other medical imaging techniques such as computed tomography (CT) or
X-rays makes it well suited to imaging the brain, muscles, heart, and malignancies, among other soft
tissues of the body. Using k-means clustering segmentation and the Gabor feature extraction
algorithm, they extract the best features of brain tumours from MRI. Unchecked tissue growth is a
hallmark of brain tumours, and early detection is key because tumours found early are easier to
treat.


Sawakare and Chaudhari [3], in "Classification of Brain Tumor Using Discrete Wavelet Transform,
Principal Component Analysis, and Probabilistic Neural Networks," propose k-means clustering for
medical imaging applications together with an automatic support system for stage classification
using artificial neural networks. Because of the unique nature of brain tumour cells, finding these
tumours is no easy task. With the k-means clustering algorithm presented in their study, magnetic
resonance images can be segmented to assess anatomical features and detect brain tumours in their
early stages. The authors also plan to employ an artificial neural network to determine whether the
brain is normal or contains a benign or malignant tumour. The segmentation results will also form
the basis of a CAD system that may detect brain tumours early, increasing the patient's chances of
survival. To extract tumour tissues from MR images, the paper details an effective automated brain
tumour segmentation method that improves upon previous approaches by employing a k-means clustering
algorithm. Classifying different types of tissues such as White Matter (WM), Grey Matter (GM),
Cerebrospinal Fluid (CSF), and occasionally diseased tissues like tumours is a well-known challenge
in MRI segmentation. Automated brain tumour classification is implemented using a radial basis
function Probabilistic Neural Network: features are extracted using GLCM and PCA, and the PNN-RBF
network is then used for classification. The decision-making process is thus split into two steps.
The classifier was assessed using both its training performance and its classification accuracy.
Based on the simulation results, the authors report that the classifier and segmentation algorithm
outperform the previous method in terms of accuracy.


Cho and Park [4], in their work on classifying low-grade and high-grade gliomas using multi-modal
imaging radiomics features, note that gliomas are primary brain tumours that originate from glial
cells. The World Health Organization (WHO) has developed a malignancy grading system under which
gliomas can be categorised into histopathologic grades. Their research presents a strategy for
predicting glioma grades using radiomic imaging data. The work used the training data, segmentation
ground truth, and ground truth labels from the MICCAI Brain Tumor Segmentation Challenge (BraTS
2015). For each FLAIR, T1, T1-Contrast, and T2 image, 45 radiomics features derived from shape,
histogram, and gray-level co-occurrence matrix (GLCM) were used to characterise glioma properties.
Among the resulting 180 features, the significant ones were selected using L1-norm regularisation
(LASSO). The model then used logistic regression to categorise gliomas as either low-grade (LGG) or
high-grade (HGG) based on the LASSO coefficients and the selected feature values. The classification
results were validated using 10-fold cross-validation: area under the curve (AUC) = 0.8870,
specificity = 0.9074, accuracy = 0.8981, and sensitivity = 0.8889.
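For readers unfamiliar with these metrics, the sketch below shows how sensitivity, specificity, and accuracy follow from the counts of a binary confusion matrix. The label encoding (1 for HGG, 0 for LGG) and the toy predictions are assumptions made purely for illustration; they are not data from [4].

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "sensitivity": tp / (tp + fn),  # share of positives that were found
        "specificity": tn / (tn + fp),  # share of negatives that were found
        "accuracy": (tp + tn) / len(pairs),
    }

# Toy example (assumed encoding: 1 = high-grade glioma, 0 = low-grade glioma).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```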


Kolarik et al. [5] proposed a 3D Dense-U-Net neural network architecture with densely connected
layers for 3D brain tissue segmentation on MRI data. The method is fully automatic and uses a recent
deep learning approach. The authors report an accuracy of 99.70 percent on testing data, exceeding
human expert results, and the approach is capable of exact segmentation without any preprocessing of
the input image, unlike many earlier methods. The architecture suggested in that paper can readily
be applied to any project that uses a U-Net network for segmentation in order to improve its
results. The model was implemented in Keras with the TensorFlow backend.


Proposed Model BTSC 


Unlike a conventional CNN, the U-Net model employs three distinct stages to safeguard features from
being lost during convolution and max pooling: down-sampling (encoding), bottleneck, and up-sampling
(decoding) [98]. The architecture is symmetric: the features acquired from the first encoding layer
are copied and concatenated with those from the last decoding layer, those from the second encoding
layer with those from the second-to-last decoding layer, and so on. A key benefit of the U-Net model
is its ability to be trained rapidly and accurately with small datasets, including 2D datasets that
are missing axial information [99-102]. This is mostly because it employs a minimal number of layers
while accurately predicting the segmentation of any abnormal region by combining the location data
acquired from the decoding path with the contextual information collected from the encoding path.
Biomedical image segmentation is the primary use of this variant of neural network [103-115].
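To make the symmetric encoder-decoder layout concrete, the following sketch traces feature-map sizes through a U-Net-style network. The 512-pixel input and the five encoding blocks follow the design described later in this paper; the base channel count of 16, the channel doubling per block, and the halving of the spatial size by 2 x 2 pooling are illustrative assumptions, not the settings of our implementation.

```python
def unet_shape_trace(size=512, depth=5, base_channels=16):
    """Return (stage, spatial size, channels) tuples for a U-Net-style network.

    Assumed for illustration: every encoder block doubles the channel count
    and is followed by 2x2 max pooling, which halves the spatial size.
    """
    trace, skips = [], []
    channels = base_channels
    for level in range(1, depth + 1):
        trace.append((f"enc{level}", size, channels))
        skips.append((size, channels))  # kept for the matching decoder block
        size //= 2
        channels *= 2
    trace.append(("bottleneck", size, channels))
    for level in range(depth, 0, -1):
        skip_size, skip_channels = skips.pop()  # last encoder pairs with first decoder
        size *= 2                               # up-sampling doubles the size
        if size != skip_size:
            raise ValueError("input size must be divisible by 2**depth")
        upsampled = channels // 2
        # Copy-and-concatenate: channels of both feature maps are stacked.
        trace.append((f"dec{level}", size, upsampled + skip_channels))
        channels = skip_channels                # convolutions reduce the stack again
    return trace

for stage, side, ch in unet_shape_trace():
    print(f"{stage:>10}: {side} x {side}, {ch} channels")
```

Under these assumptions the bottleneck works on 16 x 16 feature maps, and the last decoding block returns to the full 512 x 512 resolution, which is why the predicted mask has the same size as the input.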


A fully convolutional neural network (FCNN), as the term is used here, is a network whose feature
extraction relies exclusively on convolution and max-pooling operations. As an image classifier it
builds on the Multi-Layer Perceptron (MLP) idea: the flattened feature matrix passes through fully
connected layers. The last layer is typically an activation layer using a function such as sigmoid
or soft-max [116].


Problem Statement 


The intricate anatomy of the brain, which differs from one individual to another, makes tumour
detection a difficult undertaking. Magnetic resonance (MR) imaging can be used to detect brain
tumours. Accurately segmenting tumour areas, however, can be a tedious and time-consuming operation,
and manual tumour boundary segmentation and tumour type classification are frequently erroneous
[117-121].


Collecting the dataset needed to train the model is the first step. For this project, we use T1C
(contrast-enhanced T1 sequence) MR images of the brain as our dataset. MR imaging can distinguish
between cerebrospinal fluid (CSF) and adipose tissue, allowing for more precise localization of
tumours [122]. Multiple features of the tumour area, including volume, shape, region, and
orientation, allow brain tumour types to be categorised. To obtain high accuracy in production mode,
it is important to use a correct and valid dataset. Each brain sample in the dataset consists of the
original image and its ground truth (mask image and tumour type) [123-125].


Prior to training the neural network model, the dataset must be preprocessed. The steps include
converting the image from RGB to grayscale, applying augmentation techniques to compensate for the
variation in image properties encountered in production mode, and resizing the image to a fixed
dimension that the model can handle [126].
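The three preprocessing steps can be sketched in plain Python on images stored as nested lists. The luminance weights are the widely used ITU-R BT.601 coefficients, nearest-neighbour sampling stands in for whatever resizing method an image library would provide, and the horizontal flip is only an assumed example of an augmentation; none of these choices is confirmed as the exact procedure used in this project.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to luminance values (BT.601 weights)."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]

def resize_nearest(image, new_h, new_w):
    """Resize a 2D image by nearest-neighbour sampling."""
    old_h, old_w = len(image), len(image[0])
    return [[image[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
            for y in range(new_h)]

def horizontal_flip(image):
    """Mirror a 2D image left to right (an assumed example augmentation)."""
    return [row[::-1] for row in image]

# Tiny 2 x 2 RGB image, used only to exercise the functions.
rgb = [[(255, 255, 255), (0, 0, 0)],
       [(0, 0, 0), (255, 255, 255)]]
gray = to_grayscale(rgb)
resized = resize_nearest(gray, 4, 4)
flipped = horizontal_flip(resized)
```

For segmentation, any geometric augmentation such as the flip has to be applied identically to the image and to its ground truth mask so that the two stay aligned.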


Each pixel of the mask is either white (representing the tumour region) or black (representing the
non-tumour region), and the U-Net model learns to distinguish between the two using a variety of
characteristics. Assigning each pixel a value from a set of possible labels in this way is called
semantic segmentation [127]. The U-Net model is made up of three stages, which involve both
downsampling and upsampling. These operations allow the model to keep track of both the location and
the context of the tumour region. As a result, training the model with a small number of samples
yields remarkably accurate predictions in both the testing and production phases. By using
augmentation approaches that account for the non-uniformity of brain samples encountered in
production, the requirement for 3D data volumes is removed and the number of MR sequences per sample
is reduced to just the T1C sequence.


The U-Net architecture in this project is trained to make accurate predictions using only 2D MR
images of the T1C sequence, and it needs fewer samples. Our design includes five downsampling
blocks, one bottleneck layer, five upsampling blocks, and an output layer activated using a sigmoid
function.


After the U-Net model outputs a tumour mask, the second component of the programme, an FCNN,
determines the tumour type. It is a basic neural network that uses convolutional and max-pooling
layers followed by fully connected layers. In the last layer, a soft-max activation function
estimates the likelihood that the tumour is a meningioma, glioma, or pituitary tumour, and produces
a one-dimensional vector with three elements, one per tumour type. Blending or overlaying the U-Net
mask image with the original image preserves the localization of the tumour region relative to the
brain's orientation, which improves accuracy.
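The soft-max step can be illustrated as follows. The ordering of the three classes and the example scores are assumptions for the sake of the sketch; only the soft-max formula itself is standard.

```python
import math

# Assumed class order; the actual index-to-label mapping depends on the dataset.
TUMOUR_TYPES = ["meningioma", "glioma", "pituitary"]

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_type(scores):
    """Return the most likely tumour type and the full probability vector."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return TUMOUR_TYPES[best], probs

# Made-up scores from the last linear layer, for illustration only.
label, probs = predict_type([0.2, 2.5, -1.0])
```

The predicted type is simply the element with the largest probability, here the second one.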


The FCNN architecture used in this application consists of two sets of convolutional and max pooling
layers, followed by three linear dense layers, with the soft-max function as the output layer.
Because of its minimal number of layers and nodes, it uses considerably less VRAM during
back-propagation and takes substantially less time to train and to produce predictions.


With the click of a button, the user can download the original image, generated mask, and predicted
tumour type to their local system in one of several image formats. This report is assembled using
image processing techniques. The desktop app is built with Electron JS and is compatible with both
Mac and Windows computers.


The development process spans several phases, including development, training, and application
testing, each with its own software and hardware requirements. Among other requirements, the
application's input data and the output it generates must also comply with certain constraints.


Requirement analysis is the process of gathering user needs and designing an application to address
them. In production mode, the primary requirements are constraints on the user's hardware and on the
input data. During acceptance testing, the application's output data is checked for accuracy. These
requirements are broken down in what follows.


Two main types of data are used with machine learning models: train/test data and live data. Because
the train and test sets are essentially two parts of the same dataset, they typically share the same
constraints. In a production setting, the user provides the live data, which is subject to different
constraints.


Result 


The output data is the same in development and production modes. It consists of the tumour mask
image and the tumour type determined from the mask image, both produced by the models in our
application. The user is then offered multiple image formats to choose from when downloading the
diagnostic report. The report is a tight-layout image that combines the original image, mask image,
and tumour type, and that the user can save to their PC with the press of a button. The proposed
system is made up of two main deep learning modules: (A) a U-Net model for segmenting tumour regions
and (B) an FCNN for classifying tumour types.


As described above, the U-Net model copies the features acquired from each encoding layer and
concatenates them with those of the symmetric decoding layer (the first encoding layer with the last
decoding layer, the second with the second-to-last, and so on). This copy-and-concatenation method
allows the feature extraction process to keep the image's resolution intact. The last layer employs
sigmoid activation to determine whether a pixel should be labelled as belonging to the tumour
region.


Combining the decoding path's localization detail with the encoding path's contextual information
lets the model partition the tumour area accurately even when trained on small 2D datasets that are
missing axial information. Our U-Net design consists of five encoding blocks, one bottleneck layer,
and five decoding blocks. The output layer takes the result from the last decoding block and applies
a sigmoid activation to it (fig. 1).


Figure 1: Architecture of U-Net segmentation module [6] 


During training, a 512 by 512 grayscale image is passed through each of the model's blocks. The
output is then run through a basic thresholding filter that sets all pixels with a value of 0.5 or
greater to 1 (white, the tumour region) and all other pixels to 0 (black, the background or
non-tumour region).
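The thresholding filter amounts to a single comparison per pixel. A minimal sketch on a nested-list image (the function name and the sample values are illustrative):

```python
def binarize(prob_mask, threshold=0.5):
    """Map sigmoid outputs to a binary mask: 1 = tumour, 0 = background."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_mask]

# Illustrative 2 x 2 patch of sigmoid outputs.
mask = binarize([[0.10, 0.50],
                 [0.90, 0.49]])
# mask is [[0, 1], [1, 0]]
```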


The final product is the model's predicted mask image. The loss value is determined by comparing
this image to the ground truth mask image. The loss is calculated for each batch rather than for
each individual sample. To make the most of the gradients of both loss functions, we combined Binary
Cross Entropy (BCE) and Dice Loss in the model.
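A combined BCE and Dice loss can be sketched as below on flattened lists of pixel values. How the two terms are weighted is not specified here, so the unweighted sum, the smoothing constant, and the clipping value are assumptions; a PyTorch implementation would operate on tensors instead of lists.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy over all pixels in the batch."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)

def dice_loss(pred, target, smooth=1.0):
    """1 - Dice coefficient; 'smooth' avoids division by zero on empty masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)

def bce_dice_loss(pred, target):
    """Assumed combination: plain sum of the two losses."""
    return bce_loss(pred, target) + dice_loss(pred, target)

target = [1, 0, 1, 0]                         # flattened ground truth mask
uncertain = bce_dice_loss([0.5] * 4, target)  # model outputs 0.5 everywhere
confident = bce_dice_loss([0.9, 0.1, 0.9, 0.1], target)
```

BCE penalises each pixel independently, while the Dice term measures the overlap of the whole predicted region with the ground truth, which helps when the tumour occupies only a small part of the image.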


During the production phase, the model receives a 512 x 512-pixel, single-channel (grayscale) input
image and outputs a single-channel binary mask image. To highlight the tumour area in the original
MR image, the predicted mask is then blended with it using a preset transparency. This blended image
is the input to the FCNN model. Our approach for classifying the segmented tumour is a
straightforward fully Convolutional Neural Network. Two sets of convolution and max pooling blocks
precede three dense linear layers. The last linear layer produces three output features, one for
each kind of tumour, and the output layer applies a soft-max activation to them. The result is a
three-element vector in which each element represents the likelihood of a different type of tumour.
The element with the highest probability gives the classification result, i.e. the most likely type
of tumour.
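Blending the mask with the MR image is a per-pixel weighted average. The sketch below assumes 8-bit grayscale intensities and draws the mask in white; the default of 0.5 mirrors the 50% transparency mentioned in the conclusion, while the function and parameter names are illustrative.

```python
def blend(image, mask, alpha=0.5, mask_value=255):
    """Overlay a binary mask on a grayscale image with transparency 'alpha'."""
    return [[(1.0 - alpha) * pixel + alpha * (m * mask_value)
             for pixel, m in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

# One row of two pixels: the first lies inside the tumour mask, the second does not.
blended = blend([[100, 200]], [[1, 0]])
# blended is [[177.5, 100.0]]
```

Pixels inside the mask are pulled towards white and pixels outside are darkened, so the classifier sees both the anatomy and the tumour location in a single channel.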
Training phase


To highlight the tumour region, the original image is blended with the ground truth mask image. The
model then returns the likelihood of each tumour type for this image. The ground truth is a one-hot
encoded vector in which the real tumour type is represented by one and the other two types by zeros.
The loss is computed using Categorical Cross Entropy (CCE) and passed to the optimizer, which uses
it to adjust the weights and biases.
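One-hot encoding and the categorical cross entropy for a single sample can be written out as follows. The class index and the probabilities are made-up values for illustration.

```python
import math

def one_hot(index, num_classes=3):
    """Ground truth vector: 1 for the real tumour type, 0 for the others."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]

def categorical_cross_entropy(probs, target, eps=1e-12):
    """CCE for one sample: minus the log-probability of the true class."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))

target = one_hot(1)                                            # true class is index 1
good = categorical_cross_entropy([0.1, 0.8, 0.1], target)      # confident and correct
poor = categorical_cross_entropy([0.6, 0.2, 0.2], target)      # mostly wrong
```

Because only the true class has a non-zero target, the loss reduces to the negative logarithm of the probability assigned to that class, so a confident correct prediction yields a small loss.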


Result and Discussion 


The majority of the dataset was used for training, while a portion (20%) was set aside for testing.
A batch size of 6 was used for training in order to avoid overfitting the models. This means that
the internal parameters are updated using the gradients after every six training samples. Since the
parameters do not need to be updated after each individual sample, this also improves computing
efficiency. Each model was trained for 100 iterations with an initial learning rate of 1e-4.
Whenever the model reached a plateau, defined as the loss not decreasing in two consecutive
iterations, the learning rate was reduced to 80 percent of its value (multiplied by 0.8). This helps
the model descend gradually towards a loss minimum without overshooting. An Adam optimizer (short
for "Adaptive Moment Estimation") was used to combine the strengths of stochastic gradient descent
with adaptive learning rates.
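The plateau rule can be expressed as a small scheduler. The class below is a hypothetical stand-alone sketch of the rule as stated (initial rate 1e-4, factor 0.8, two consecutive iterations without improvement); PyTorch ships a built-in scheduler of this kind, `torch.optim.lr_scheduler.ReduceLROnPlateau`, but this sketch does not claim to reproduce its exact behaviour.

```python
class PlateauScheduler:
    """Multiply the learning rate by 'factor' after 'patience' iterations
    in a row in which the loss did not improve on the best value so far."""

    def __init__(self, lr=1e-4, factor=0.8, patience=2):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.bad_iterations = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_iterations = 0
        else:
            self.bad_iterations += 1
            if self.bad_iterations >= self.patience:
                self.lr *= self.factor
                self.bad_iterations = 0
        return self.lr

# Illustrative loss history: the loss stalls at iterations 3 and 4.
scheduler = PlateauScheduler()
history = [scheduler.step(loss) for loss in [0.50, 0.40, 0.40, 0.41, 0.30]]
```

In this toy history the learning rate stays at 1e-4 for three iterations and drops to 8e-5 at the fourth, when the second consecutive non-improving loss is observed.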


The loss curves of the two models' training processes were plotted with Matplotlib. Thanks to the
augmentation applied to the training dataset to account for varied conditions, the U-Net model
learns very fast. The models were trained on an Nvidia GTX 1660 Ti GPU. It took 3 hours and 21
minutes to train the U-Net segmentation model and 1 hour and 15 minutes to train the FCNN classifier
model. The lowest loss reached was 0.053 for the U-Net model and 0.138 for the FCNN model. Both
models' weights and biases were saved at their minimum loss values as PyTorch files (.pt extension).
During testing or production, these saved models are loaded onto the GPU. The stored models were
tested on the 20% test dataset. On 612 randomly selected samples, the segmentation model achieved an
average dice (or F1) score of 0.812 and the classifier an accuracy of 93.46%. An average dice score
of 0.812 indicates that the predicted tumour location is very close to the ground truth. The average
classification accuracy of 93.46% is encouraging and may be even higher with a more comprehensive
and genuine dataset. Figure 2 below displays the measurements for a randomly picked test sample:


Figure 2: Test result for a randomly picked sample [7] 
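The two evaluation metrics reported above can be computed as in the following sketch, which works on flattened binary masks and lists of class labels. The convention of scoring two empty masks as a perfect match is an assumption, and the sample values are illustrative rather than taken from the test set.

```python
def dice_score(pred_mask, true_mask):
    """Dice (F1) overlap between two flattened binary masks."""
    intersection = sum(p * t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * intersection / total

def accuracy(predicted_labels, true_labels):
    """Fraction of samples whose predicted tumour type is correct."""
    correct = sum(1 for p, t in zip(predicted_labels, true_labels) if p == t)
    return correct / len(true_labels)

# Illustrative values only.
dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
acc = accuracy(["glioma", "glioma"], ["glioma", "pituitary"])
```

The averages reported in this section correspond to taking the mean of such per-sample dice scores and the accuracy over all test samples.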
Conclusion


To address the challenges of training a supervised neural network model on a small dataset for brain
tumour segmentation and classification, this project proposes a U-Net segmentation model. The model
generates the tumour mask image in a straightforward way because it is designed to minimise training
time while still delivering an output image of the original dimension. Even when trained on a small
dataset of 2D cross-sectional images alone, the model achieves segmentation success with an average
dice score of more than 0.70. The accuracy of the proposed FCNN classifier, approximately 93%,
indicates that it has learned the majority of the features from the blended original + mask images.
2D cross-sectional scans nevertheless lack some of the information available in 3D volumes,
including tumour volume and other critical axial properties, which could affect prediction accuracy.
The model could be enhanced in the future by using 3D volumes as input, which would involve stacking
2D slices along the axial axis and would enable the model to make use of this additional
information. The augmentation methods, custom filter set, and training batch sizes are the primary
contributions of this work. The system's goal is to shorten training times without using 3D
datasets. Superimposing the mask image on the original image with 50% transparency makes the
classifier easy to train, because it combines the contextual detail from the original image with the
localization detail from the mask image for maximum feature extraction and learning.